Conversation

silverweed
Contributor

First PR of a series to merge the RNTuple Attributes into master. The final result will be this, although the commits will be reorganized to be more coherent and reviewable.

This first PR updates the binary format specification (introducing a new minor version) and updates the Serializer and Descriptor code to match. The change is backward-compatible; no Attribute can be written yet, since the writer API will be introduced later.

Checklist:

  • tested changes locally
  • updated the docs (if necessary)

github-actions bot commented Sep 16, 2025

Test Results

    21 files      21 suites   3d 21h 11m 14s ⏱️
 3 674 tests  3 666 ✅  0 💤 8 ❌
75 342 runs  75 323 ✅ 10 💤 9 ❌

For more details on these failures, see this check.

Results for commit 024401a.

♻️ This comment has been updated with latest results.

@silverweed silverweed added the "clean build" label (Ask CI to do non-incremental build on PR) Sep 16, 2025
@silverweed silverweed closed this Sep 16, 2025
@silverweed silverweed reopened this Sep 16, 2025
Contributor

@jblomer jblomer left a comment

In principle looks good to me. Some comments.

@silverweed silverweed force-pushed the ntuple_attr_1 branch 2 times, most recently from b0449c0 to 48b72ff September 18, 2025 08:45
@silverweed
Contributor Author

I updated the PR and made RNTupleAttrSetDescriptor consistent with the other descriptor classes.

@silverweed silverweed requested a review from jblomer September 18, 2025 08:46
Contributor

@jblomer jblomer left a comment

Very nice! Some minor comments.

Since this is changing the binary format, let's perhaps get a second approval.

Member

@hahnjo hahnjo left a comment

some comments on the spec additions and the iterator implementations

1. it cannot have linked Attribute RNTuples itself;
2. the Alias columns sections, both in its header and footer, must be empty (i.e. none of the Attribute Set RNTuple's
Fields can be Projected Fields);
3. none of its fields may have a structural role of 0x04 (i.e. it must not contain a ROOT streamer object);
Member

Is it a choice or a technical limitation?

Contributor Author

@silverweed silverweed Sep 24, 2025

It's a choice (see other answer)

An Attribute Set RNTuple has a number of restrictions compared to a regular RNTuple:

1. it cannot have linked Attribute RNTuples itself;
2. the Alias columns sections, both in its header and footer, must be empty (i.e. none of the Attribute Set RNTuple's
Member

Is there (somewhere else) a more detailed explanation/reason for this limitation?

Contributor Author

We decided that we want to keep the Model of the attribute sets "as simple as possible" by removing certain advanced functionalities that we don't see as useful in the context of metadata. We may relax these restrictions in the future should the need arise.
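
To make the three restrictions quoted above concrete, here is a minimal sketch of how a reader might validate them. The `FieldInfo` and `NTupleInfo` types and the function name are hypothetical stand-ins invented for this illustration; they are not ROOT's descriptor classes, and the real check may be structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified view of an RNTuple's metadata (not ROOT's API).
struct FieldInfo {
   std::string name;
   std::uint16_t structuralRole; // 0x04 marks a ROOT streamer object
};

struct NTupleInfo {
   std::vector<FieldInfo> fields;
   std::size_t nAliasColumns = 0;        // alias columns in header and footer combined
   std::size_t nLinkedAttributeSets = 0; // Attribute Sets linked from this RNTuple
};

constexpr std::uint16_t kRoleStreamer = 0x04;

// Returns an empty string if `info` satisfies the three restrictions,
// otherwise a short description of the first violated one.
std::string CheckAttributeSetRestrictions(const NTupleInfo &info)
{
   if (info.nLinkedAttributeSets != 0)
      return "attribute set has linked attribute sets";
   if (info.nAliasColumns != 0)
      return "attribute set has alias columns (projected fields)";
   for (const auto &f : info.fields) {
      if (f.structuralRole == kRoleStreamer)
         return "field '" + f.name + "' has structural role 0x04 (streamer)";
   }
   return "";
}
```

Returning a message rather than a boolean keeps the sketch close to how a deserializer would report which restriction was violated.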


### Attribute Schema Version
Each Attribute Set is created with a user-defined Model. This Model is not used directly by the underlying Attribute
Set RNTuple, but it is augmented with internal fields used to store additional data that serve to associate each
Member

When inspecting the Attribute Set RNTuple, will the additional fields be exposed or hidden? (And if they are exposed, would a user be able to distinguish the implicit from the explicit part of the model?)

Contributor Author

They will be hidden: the user doesn't have access to the inner model but only to the fields they defined on the user model.

if (fnBufSizeLeft() < static_cast<int>(sizeof(std::uint64_t)))
   return R__FAIL("record frame too short");
std::uint16_t vMajor, vMinor;
bytes += DeserializeUInt16(bytes, vMajor);
Member

Should there be a:

if (vMajor > 1)
	throw;

and/or did I misunderstand:

A change in Major version number indicates a breaking, non-forward-compatible change in the schema: readers should
refuse reading an Attribute Set whose Major Schema Version is unknown.

and/or should that statement be weakened?

Contributor Author

Yes, we should add that check, thanks.

Contributor Author

Actually, I think I'll add it in a later PR, when I define a constant with the current major version.
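
For illustration only, the check discussed in this thread could look roughly like the sketch below. The constant name `kAttrSchemaVersionMajor` and the helper function are hypothetical (the actual constant is deferred to a later PR), and treating any Minor version as acceptable is an assumption not stated in the quoted spec text. In the real deserializer the failure would presumably be reported with `R__FAIL` rather than a boolean.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constant: the highest Major Schema Version this reader understands.
constexpr std::uint16_t kAttrSchemaVersionMajor = 1;

// Returns true if an Attribute Set with this schema version may be read.
// Assumption: only the Major version gates readability; the Minor version
// is accepted as-is and therefore ignored here.
bool IsSupportedAttrSchemaVersion(std::uint16_t vMajor, std::uint16_t /*vMinor*/)
{
   return vMajor <= kAttrSchemaVersionMajor;
}
```

A caller would invoke this right after deserializing `vMajor` and `vMinor` and refuse to read the Attribute Set when it returns `false`.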

Contributor

@enirolf enirolf left a comment

LGTM in principle! Just some additional suggestions to the spec to improve readability and consistency.


## Linked Attribute Sets

An RNTuple may have zero or more linked Attribute Sets, containing metadata.
Contributor

Suggested change
- An RNTuple may have zero or more linked Attribute Sets, containing metadata.
+ An RNTuple may have zero or more linked Attribute Sets, containing user-defined metadata.

Contributor Author

I don't think this is necessarily true, as some metadata may be automatically defined (e.g. ROOT's internal attributes). In any case, it is not relevant for the purposes of the specification.

Contributor

@enirolf enirolf Sep 24, 2025

I think it is relevant to distinguish between the type of metadata the attributes store and the other kind of metadata (i.e. the ones in the header and footer) that is mentioned throughout this spec. It's true that "user-defined" may not be complete enough, how about something like "entry metadata", "metadata over (ranges of) entries" or "metadata related to the stored contents" instead?

Contributor Author

I see; I'll think of an appropriate phrasing.

Comment on lines +816 to +834
An attribute set record frame has the following contents:
```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Schema Version Major      |     Schema Version Minor      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               Attribute Anchor Uncompressed Size              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```

- The first 32 bits contain the _Attribute Schema Version_. This is split into a _Major_ (16 LSB) and a
_Minor_ (16 MSB) version. The Schema Version is described below;
- a 32-bit unsigned integer follows, containing the uncompressed size of the Attribute Anchor.

These fields are followed by:

- a locator pointing to the Attribute RNTuple's anchor;
- a string containing the Attribute Set's name. All linked Attribute Sets must have a non-empty, distinct name.
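
As a rough illustration of the fixed-size part of this layout, the sketch below decodes the two 16-bit version numbers and the 32-bit uncompressed anchor size from a raw buffer. It assumes little-endian integers, and the `AttrSetFrameHeader` struct and function names are hypothetical and unrelated to ROOT's Serializer API. The record frame's own size prefix, the locator, and the name string are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical container for the first 8 bytes of the frame contents.
struct AttrSetFrameHeader {
   std::uint16_t versionMajor;
   std::uint16_t versionMinor;
   std::uint32_t anchorUncompressedSize;
};

// Reads an unsigned little-endian integer of nBytes (at most 4) bytes.
inline std::uint32_t ReadLE(const unsigned char *p, std::size_t nBytes)
{
   std::uint32_t v = 0;
   for (std::size_t i = 0; i < nBytes; ++i)
      v |= static_cast<std::uint32_t>(p[i]) << (8 * i);
   return v;
}

// Returns std::nullopt if the buffer is too short to hold the fixed part.
std::optional<AttrSetFrameHeader> ParseAttrSetFrameHeader(const unsigned char *buf, std::size_t size)
{
   if (size < 8)
      return std::nullopt;
   AttrSetFrameHeader h;
   h.versionMajor = static_cast<std::uint16_t>(ReadLE(buf, 2));     // 16 LSB of the first word
   h.versionMinor = static_cast<std::uint16_t>(ReadLE(buf + 2, 2)); // 16 MSB of the first word
   h.anchorUncompressedSize = ReadLE(buf + 4, 4);
   return h;
}
```

Because the Major version occupies the low 16 bits of a little-endian 32-bit word, it is the first field read from the byte stream, which matches the order of the `DeserializeUInt16` calls in the diff.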
Member

Sorry for not being clear: I think this part must stay in the "Footer Envelope" section. The remaining restrictions and explanations should stay in this new section, thanks.

Labels
clean build (Ask CI to do non-incremental build on PR), in:RNTuple

5 participants